AITopics | imab problem

Collaborating Authors

imab problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Solving Inverse Problem for Multi-armed Bandits via Convex Optimization

Zhu, Hao, Boedecker, Joschka

arXiv.org Artificial IntelligenceJan-31-2025

We consider the inverse problem of multi-armed bandits (IMAB) that are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition where our method provides global solution to the IMAB problem with certificate, as well as approximations to further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can achieve the performance of Monte Carlo methods within a significantly decreased running time. We provide the implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.

artificial intelligence, imab problem, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.18945

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multi-Armed Bandits with Self-Information Rewards

Weinberger, Nir, Yemini, Michal

arXiv.org Artificial IntelligenceSep-6-2022

This paper introduces the informational multi-armed bandit (IMAB) model in which at each round, a player chooses an arm, observes a symbol, and receives an unobserved reward in the form of the symbol's self-information. Thus, the expected reward of an arm is the Shannon entropy of the probability mass function of the source that generates its symbols. The player aims to maximize the expected total reward associated with the entropy values of the arms played. Under the assumption that the alphabet size is known, two UCB-based algorithms are proposed for the IMAB model which consider the biases of the plug-in entropy estimator. The first algorithm optimistically corrects the bias term in the entropy estimation. The second algorithm relies on data-dependent confidence intervals that adapt to sources with small entropy values. Performance guarantees are provided by upper bounding the expected regret of each of the algorithms. Furthermore, in the Bernoulli case, the asymptotic behavior of these algorithms is compared to the Lai-Robbins lower bound for the pseudo regret. Additionally, under the assumption that the \textit{exact} alphabet size is unknown, and instead the player only knows a loose upper bound on it, a UCB-based algorithm is proposed, in which the player aims to reduce the regret caused by the unknown alphabet size in a finite time regime. Numerical results illustrating the expected regret of the algorithms presented in the paper are provided.

algorithm, alphabet size, confidence interval, (17 more...)

arXiv.org Artificial Intelligence

2209.02211

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New Jersey (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.56)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

Add feedback

Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits

Patil, Vishakha, Nair, Vineet, Ghalme, Ganesh, Khan, Arindam

arXiv.org Artificial IntelligenceAug-19-2022

We study the Improving Multi-Armed Bandit (IMAB) problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. In many of these applications, the time horizon is unknown to the decision-maker beforehand, which motivates the study of the IMAB problem in the technically more challenging horizon-unaware setting. We study the tension that arises between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time based on current rewards of the arms, and b) ensuring that arms with better long-term rewards get sufficient opportunities even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other in this setting. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm till it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer $\Omega(T)$ policy regret and $\Omega(k)$ competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is $O(k)$.

algorithm, cumulative reward, imab problem, (16 more...)

arXiv.org Artificial Intelligence

2208.09254

Country:

Asia > Singapore (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback